Abstract: A great deal of inference in statistics is based on the approximation that a
statistic is normally distributed. The error in doing so is generally $O(n^{-1/2})$ and can be
considerable when the distribution is heavily biased or skewed. This note shows how one may reduce
this error to $O(n^{-(j+1)/2})$, where $j$ is a given integer. The case considered is when the statistic is
the mean of the sample values from a continuous one-parameter distribution, after the sample has
undergone an initial transformation.
1 Introduction and summary



Given a random sample of size $n$, the usual confidence interval for the population mean has an
error of order $O(n^{-1/2})$. The aim of this note is to show how a more accurate confidence interval,
with error of order $O(n^{-(j+1)/2})$, where $j$ is a given integer, can be obtained. This is an important
problem because finding accurate confidence intervals for the population mean is an everyday problem
faced by many scientists and engineers.

Suppose for some statistic $\hat{\psi}_n$, $Y_n(\theta) = n^{1/2}\{\hat{\psi}_n - g(\theta)\}/\sigma(\theta) \to N(0,1)$ as $n \to \infty$, where
$\theta \in \mathbb{R}$. Cornish and Fisher (1937) and Fisher and Cornish (1960) obtained an Edgeworth type
expansion for the distribution of $Y_n(\theta)$, and an asymptotic expansion for its percentile points when
this distribution is parameter-free. Withers (1984) gave a simplified version of their results which
reduced the labor of their application. Withers (1983) considered the more general case, where the
distribution of $Y_n(\theta)$ does depend on $\theta$. In this case a parameter-free transformation $V_{nx}(\cdot)$ was
given such that

$$V_{nx_1}(\hat{\psi}_n) < \theta < V_{nx_2}(\hat{\psi}_n) \text{ with probability } 1 - \alpha \tag{1.1}$$

provided

$$\Phi(x_1) - \Phi(x_2) = 1 - \alpha, \text{ e.g. } x_1 = -x_2 = \Phi^{-1}(1 - \alpha/2), \tag{1.2}$$

assuming $g(\cdot)$ is an increasing function, where $\Phi(\cdot)$ denotes the distribution function of a standard
normal random variable. If $g(\cdot)$ is decreasing the inequalities in (1.1) are reversed.

In Section 2, we show how this theory applies to $\hat{\psi}_n = \bar{X}_n$, the mean of $X_1, \ldots, X_n$ i.i.d. $F(x, \theta)$
on $\mathbb{R}$, where $F(x, \theta)$ is of known parametric form.

In applications $\{X_i\}$ will generally not be the original observations $\{Y_i\}$, say, but will be given
by $X_i = h(Y_i)$, where $h(\cdot)$ is a transformation chosen from considerations of efficiency, robustness
or ease of computation of the first few cumulants of $F(x, \theta)$ as functions of $\theta$. So, if $Y_1 \sim R(x, \theta)$
then

$$F(x, \theta) = R(h^{-1}(x), \theta) \tag{1.3}$$

for $h(\cdot)$ one to one increasing.

However, $\{Y_i\}$ need not lie in $\mathbb{R}$. Their distribution may, in fact, depend on parameters, $\lambda$, other
than $\theta$ provided $F(x, \theta)$ does not depend on $\lambda$. Note that $\theta$ itself may be a reparameterisation of
an original parameter.

In Section 3 the efficiency and robustness of this class of procedures is considered. 

In Section 4 this theory is applied to the 'Lehmann alternative': $F(x, \theta) = R(x)^\theta$, where by
suitable choice of $h(\cdot)$, $R(\cdot)$ may be any continuous distribution.

For many-parameter inference problems, see Withers (1989).



2 The general case 

Let $X_1, \ldots, X_n$ be a random sample from a distribution $F(x, \theta)$ on $\mathbb{R}$ such that $g(\theta) = E X_1$ is a
known one-to-one function from the parameter space, assumed to be some subset of $\mathbb{R}$. Set

$$\sigma(\theta)^2 = \operatorname{var} X_1, \quad T_n = \bar{X}_n - g(\theta), \quad Y_n(\theta) = n^{1/2} T_n/\sigma(\theta).$$

For any real random variable $X$ set

$$\kappa_r(X) = r\text{th cumulant of } X,$$

$$\ell_r(X) = \kappa_2(X)^{-r/2} \kappa_r(X) - \delta_{r,2}, \tag{2.1}$$

where $\delta_{r,s} = 1$ if $r = s$ and $\delta_{r,s} = 0$ if $r \neq s$. Suppose that for some $j > 0$, $\kappa_r(X_1)$ exists for
$1 \leq r \leq j + 2$ and



$$\limsup_{t \to \pm\infty} \left| \int \exp(itx)\, dF(x, \theta) \right| < 1. \tag{2.2}$$



This condition rules out many discrete lattice distributions. Then by Theorem 3, page 541 of Feller
(1971),

$$P(Y_n(\theta) \leq x) = \Phi(x) - \phi(x) \sum_{r=1}^{j} n^{-r/2} U_r(x) + o(n^{-j/2}) \text{ as } n \to \infty$$

uniformly in $x$, where $\phi(\cdot)$ is the standard normal density and $U_r(x)$ is a polynomial in $x$ defined in
terms of $\ell_1, \ldots, \ell_{r+2}$, where $\ell_r = \ell_r(X_1)$. Note that $\ell_1 = \ell_2 = 0$. ($U_r = R_{r+2}$ is defined by Feller.)
In particular, by equation (6.50) of Stuart and Ord (1987),

$$U_1 = \ell_3 H_2/6,$$

$$U_2 = \ell_4 H_3/24 + \ell_3^2 H_5/72,$$

$$U_3 = \ell_5 H_4/120 + \ell_3 \ell_4 H_6/144 + \ell_3^3 H_8/1296,$$

$$U_4 = \ell_6 H_5/720 + \ell_4^2 H_7/1152 + \ell_3 \ell_5 H_7/720 + \ell_3^2 \ell_4 H_9/1728 + \ell_3^4 H_{11}/31104,$$

where $H_r$ is the Hermite polynomial: $H_r(x) = \exp(x^2/2)(-d/dx)^r \exp(-x^2/2)$. For example,
$H_1, \ldots, H_{10}$ are given by equation (6.23) of Stuart and Ord (1987). Cornish and Fisher (1937) used
this to show (for a more general situation but assuming all cumulants exist) that

$$P(Y_n(\theta) \leq x) = \Phi(\xi_n(x)) = \Phi(\xi_{nj}(x)) + o(n^{-j/2}),$$

where

$$\xi_n(x) = x - \sum_{r=1}^{\infty} n^{-r/2} f_r(x), \quad \xi_{nj}(x) = x - \sum_{r=1}^{j} n^{-r/2} f_r(x)$$

and $f_r(x)$ is a polynomial in $x$ depending on $\ell_1, \ldots, \ell_{r+2}$:

$$f_r(x) = \beta_r' a_r(x),$$



where

$$\beta_1 = \ell_3, \quad \beta_2 = (\ell_4, -\ell_3^2)', \quad \beta_3 = (\ell_5, -\ell_3 \ell_4, \ell_3^3)',$$

$$\beta_4 = (\ell_6, -\ell_4^2, -\ell_3 \ell_5, \ell_3^2 \ell_4, -\ell_3^4)' \tag{2.3}$$

and

$$a_1(x) = H_2(x)/6, \quad a_2(x) = (H_3(x)/24, (4x^3 - 7x)/36)',$$

$$a_3(x) = (H_4(x)/120, (11x^4 - 42x^2 + 15)/144, (69x^4 - 187x^2 + 52)/648)',$$

$$a_4(x) = (H_5(x)/720, (5x^5 - 32x^3 + 35x)/384, (7x^5 - 48x^3 + 51x)/360,$$

$$\qquad (111x^5 - 547x^3 + 456x)/864, (948x^5 - 3628x^3 + 2473x)/7776)'.$$
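As a rough numerical illustration of the Edgeworth expansion quoted from Feller (1971), the sketch below evaluates the $j = 2$ approximation $\Phi(x) - \phi(x)\{n^{-1/2} U_1(x) + n^{-1} U_2(x)\}$ for the standardized mean of unit exponential observations, for which $\ell_3 = 2$ and $\ell_4 = 6$. The choice of distribution and the function names (`hermite`, `edgeworth_cdf`, `exact_cdf_exponential_mean`) are assumptions made for this example only; the exact value is available here because a sum of $n$ unit exponentials has a gamma distribution.

```python
import math
from statistics import NormalDist


def hermite(r, x):
    """Probabilists' Hermite polynomial H_r(x), via H_{k+1} = x H_k - k H_{k-1}."""
    h_prev, h = 1.0, x
    if r == 0:
        return h_prev
    for k in range(1, r):
        h_prev, h = h, x * h - k * h_prev
    return h


def edgeworth_cdf(x, n, l3, l4):
    """Phi(x) - phi(x) * (n^{-1/2} U_1(x) + n^{-1} U_2(x)), i.e. the j = 2 expansion."""
    u1 = l3 * hermite(2, x) / 6
    u2 = l4 * hermite(3, x) / 24 + l3 ** 2 * hermite(5, x) / 72
    std = NormalDist()
    return std.cdf(x) - std.pdf(x) * (u1 / math.sqrt(n) + u2 / n)


def exact_cdf_exponential_mean(x, n):
    """Exact P(Y_n <= x) for unit exponential X_i (illustrative assumption)."""
    s = n + math.sqrt(n) * x  # Y_n <= x  <=>  sum(X_i) <= s
    if s <= 0:
        return 0.0
    # For integer n, the Gamma(n, 1) cdf at s is 1 - P(Poisson(s) <= n - 1).
    return 1.0 - sum(math.exp(-s) * s ** k / math.factorial(k) for k in range(n))


n = 10
for x in (-1.0, 0.0, 1.0):
    normal = NormalDist().cdf(x)
    approx = edgeworth_cdf(x, n, 2.0, 6.0)
    exact = exact_cdf_exponential_mean(x, n)
    print(f"x={x:+.1f}  normal={normal:.5f}  edgeworth={approx:.5f}  exact={exact:.5f}")
```

For a skew distribution such as this one, the correction terms should visibly reduce the error of the plain normal approximation even at $n = 10$.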

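They further showed that if $z_\alpha = \Phi^{-1}(1 - \alpha)$ then

$$Y_n(\theta) \leq \eta_n(z_\alpha) \text{ with probability } 1 - \alpha,$$

where

$$\eta_n(y) = y + \sum_{r=1}^{\infty} n^{-r/2} g_r(y)$$

and $g_r(y)$ is a polynomial in $y$ depending on $\ell_1, \ldots, \ell_{r+2}$:

$$g_r(y) = \beta_r' b_r(y), \tag{2.4}$$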
where

$$b_1(y) = H_2(y)/6,$$

$$b_2(y) = (H_3(y)/24, (2y^3 - 5y)/36)',$$

$$b_3(y) = (H_4(y)/120, (y^4 - 5y^2 + 2)/24, (12y^4 - 53y^2 + 17)/324)',$$

$$b_4(y) = (H_5(y)/720, (3y^5 - 24y^3 + 29y)/384, (2y^5 - 17y^3 + 21y)/180,$$

$$\qquad (14y^5 - 103y^3 + 107y)/288, (252y^5 - 1688y^3 + 1511y)/7776)',$$

and $g_5$, $g_6$ may be obtained from V, VI, pages 214, 215 of Fisher and Cornish (1960), by setting
$a = b = 0$, $c = \ell_3$, $d = \ell_4$, $e = \ell_5$, etc.
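The percentile expansion can be checked numerically on a case with a tabulated answer. The following sketch, under the assumption that the sample is $n$ independent $\chi^2_1$ variables (so that $n\bar{X}_n \sim \chi^2_n$, with $\ell_3 = 2\sqrt{2}$, $\ell_4 = 12$, $\ell_5 = 48\sqrt{2}$), evaluates $\eta_{nj}(y) = y + \sum_{r=1}^{j} n^{-r/2} g_r(y)$ for $j \leq 3$ using $g_r = \beta_r' b_r$. The function names `cornish_fisher_eta` and `chi2_quantile` are hypothetical and introduced only for this illustration.

```python
import math
from statistics import NormalDist


def cornish_fisher_eta(y, n, l3, l4, l5, j=3):
    """eta_{nj}(y) = y + sum_{r=1}^{j} n^{-r/2} g_r(y), implemented for j <= 3."""
    h2 = y ** 2 - 1
    h3 = y ** 3 - 3 * y
    h4 = y ** 4 - 6 * y ** 2 + 3
    g = [
        l3 * h2 / 6,
        l4 * h3 / 24 - l3 ** 2 * (2 * y ** 3 - 5 * y) / 36,
        l5 * h4 / 120
        - l3 * l4 * (y ** 4 - 5 * y ** 2 + 2) / 24
        + l3 ** 3 * (12 * y ** 4 - 53 * y ** 2 + 17) / 324,
    ]
    return y + sum(g[r - 1] * n ** (-r / 2) for r in range(1, j + 1))


def chi2_quantile(p, n, j=3):
    """Approximate p-quantile of chi-square(n), viewed as n times a mean of chi-square(1)."""
    y = NormalDist().inv_cdf(p)
    l3, l4, l5 = 2 * math.sqrt(2), 12.0, 48 * math.sqrt(2)  # chi-square(1) values
    # mean n, standard deviation sqrt(2n)
    return n + math.sqrt(2 * n) * cornish_fisher_eta(y, n, l3, l4, l5, j)


for j in range(4):
    print(j, round(chi2_quantile(0.95, 10, j), 4))
```

The tabulated 95% point of $\chi^2_{10}$ is 18.307, so the printed sequence gives a feel for how quickly the successive terms $n^{-r/2} g_r$ close the gap left by the normal approximation ($j = 0$).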

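So, under (2.2), if

$$\sigma = \sigma(\theta),\ \kappa_3(X_1), \ldots, \kappa_{j+2}(X_1) \text{ exist and do not depend on } \theta \tag{2.5}$$

then

$$G_n^{-1}(1 - \alpha) = G_{nj}^{-1}(1 - \alpha) + o(n^{-j/2}),$$

where

$$G_n(x) = P(\bar{X}_n - g(\theta) \leq x), \quad G_{nj}^{-1}(1 - \alpha) = n^{-1/2} \sigma\, \eta_{nj}(z_\alpha), \quad \eta_{nj}(y) = y + \sum_{r=1}^{j} n^{-r/2} g_r(y).$$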

In particular, the confidence interval

$$g^{-1}(\bar{X}_n - G_{nj}^{-1}(\alpha/2)) < \theta < g^{-1}(\bar{X}_n - G_{nj}^{-1}(1 - \alpha/2)) \tag{2.6}$$

has level $1 - \alpha + o(n^{-j/2})$, in fact $1 - \alpha + O(n^{-(j+1)/2})$ if $\kappa_{j+3}(X_1)$ is finite. More generally, if (2.5)
is weakened to allow $\sigma(\theta)$ to vary with $\theta$ and $Y_n(\cdot)$ is one to one increasing (or decreasing), and $x_1$,
$x_2$ satisfy (1.2), then a confidence interval with level $1 - \alpha + O(n^{-(j+1)/2})$ is

$$Y_n^{-1}(\eta_{nj}(x_2)) < \theta < Y_n^{-1}(\eta_{nj}(x_1)) \tag{2.7}$$

with the inequalities reversed if $Y_n(\cdot)$ is decreasing.

These formulae have been shown to be extremely accurate. One can judge the number of
significant places when approximating $G_n^{-1}$ by $G_{nj}^{-1}$ by the size of the successive terms $n^{-r/2} g_r(z_\alpha)$,
which generally alternate in sign. See, for example, Fisher and Cornish (1960).

Withers (1983) showed, for a more general situation, how to obtain a confidence interval for
$\theta$ in the more usual situation where the cumulants depend on $\theta$. This dependency is expressed by
writing $g_r(x) = g_r(x, \theta)$, etc.

The main purpose of the present note is to apply these results to the case of the sample mean
under the assumptions (2.1), (2.2). When the initial transformation $h(\cdot)$ is independent of $\theta$ then
applying Withers (1983) to $\bar{X}_n$, one obtains that a confidence interval of level $1 - \alpha + o(n^{-j/2})$ is

$$V_{nx_1 j}(\bar{X}_n) < \theta < V_{nx_2 j}(\bar{X}_n), \tag{2.8}$$
where $V_{nxj}(t) = g^{-1}(S_{nxj}(t))$, $x_1$, $x_2$ satisfy (1.2), and

$$S_{nxj}(t) = t + \sum_{i=1}^{j} n^{-i/2} Q_i(t) \tag{2.9}$$

for $Q_i$ given by Withers (1983) in terms of $P_i(t) = \sigma(g^{-1}(t))\, g_{i-1}(x, g^{-1}(t))$ for $\{g_i\}$ as above,
where $g_0(x, \theta) = x$. Here, we have assumed $g(\cdot)$ to be increasing. If $g(\cdot)$ is decreasing the inequalities
in (2.8) are reversed.

In particular, $Q_1 = -P_1$, $Q_2 = -P_2 - (D_t P_1) Q_1$ and
$Q_3 = -P_3 - (D_t P_2) Q_1 - (D_t P_1) Q_2 - (D_t^2 P_1) Q_1^2/2$, where $D_t = d/dt$.

For such calculations it is convenient to write $P_i$ in the form

$$P_i(t) = M_i(t)' b_{i-1}(x), \quad i \geq 1,$$

where $M_i(t) = m_i(g^{-1}(t))$, $m_i(\theta) = \sigma(\theta) \beta_{i-1}$, $\beta_0 = 1$ and $b_0(x) = x$ with $\{\beta_r, b_r\}$ given by (2.3),
(2.4). Setting $\kappa_r(\theta) = \kappa_r(X_1)$, one obtains

$$m_1(\theta) = \sigma(\theta), \quad m_2(\theta) = \sigma(\theta)^{-2} \kappa_3(\theta),$$

$$m_3(\theta)' = (\sigma(\theta)^{-3} \kappa_4(\theta), -\sigma(\theta)^{-5} \kappa_3(\theta)^2),$$

etc, and so

$$Q_1(t) = -M_1(t) x, \tag{2.10}$$

$$Q_2(t) = -M_2(t) b_1(x) + x^2 D_t M_1(t)^2/2, \tag{2.11}$$

$$Q_3(t) = -M_3(t)' b_2(x) + x b_1(x) D_t \{M_1(t) M_2(t)\} - x^3 D_t^2 M_1(t)^3/6, \tag{2.12}$$

and so forth.

3 Efficiency and robustness

So far our concern has been to obtain accurate inference on the parameter of the original distribution
$R(x, \theta)$ from the size of $\bar{X}_n$, where $X_i = h(Y_i)$ and $h(\cdot)$ is a given transformation. We now consider
the efficiency of the procedure, and its robustness to outliers, as these factors are important in the
choice of $h(\cdot)$.

Let $F_n$ be the empirical distribution of $\{Y_i\}$, which we shall suppose lie in $\mathbb{R}^s$. Corresponding
to (2.8) is the point estimate

$$\hat{\theta}_n = \theta(F_n),$$

where $\theta(F) = g^{-1}(\int h\, dF)$ and $g(\cdot)$ is fixed by the choice of $h(\cdot)$:

$$g(\theta) = \int h(x)\, dR(x, \theta).$$

The influence function of $\hat{\theta}_n$ is

$$I_F(x) = \Big\{h(x) - \int h\, dF\Big\}/\dot{g}(\theta(F)), \quad \dot{g}(\theta) = dg(\theta)/d\theta,$$

which evaluated at $F(\cdot) = R(\cdot, \theta)$ gives $I_\theta(x) = (h(x) - g(\theta))/\dot{g}(\theta)$. So, to reduce the effect of
outliers it is desirable that $h(\cdot)$ be bounded. Also

$$n^{1/2}(\hat{\theta}_n - \theta) \to N(0, V(\theta, h))$$

in distribution as $n \to \infty$, where

$$V(\theta, h) = \int I_\theta(x)^2\, dR(x, \theta) = \Big(\int h(x)^2\, dR(x, \theta) - g(\theta)^2\Big)/\dot{g}(\theta)^2 = \sigma(\theta)^2/\dot{g}(\theta)^2. \tag{3.1}$$

The asymptotic efficiency of $\hat{\theta}_n$ or of the confidence interval (2.6) is inversely proportional to
$V(\theta, h)$. Note that $V(\theta, h)$ is minimized by $h = q_\theta$, where $q_\theta$ is Fisher's score function

$$q_\theta(x) = (\partial/\partial\theta) \log \{dR(x, \theta)/dR(x, 0)\}.$$

The maximum likelihood estimate $\hat{\theta}_n^*$ is asymptotically equivalent to this choice in the sense that
$n^{1/2}(\hat{\theta}_n^* - \hat{\theta}_n) \to 0$ in probability as $n \to \infty$. However, the results of Section 2 have assumed $h(\cdot)$ is
independent of $\theta$, so can only be applied to $\hat{\theta}_n^*$ when $q_\theta(x)$ has the form $a(\theta) b(x)$.



4 Lehmann's alternative 

In this section, we illustrate the results of Sections 2 and 3 when the original sample $\{Y_i\}$ has
distribution $R(x, \theta) = F_0(x)^\theta$, where $\theta > 0$, and $F_0(\cdot)$ is a continuous distribution. This is sometimes
known as 'Lehmann's alternative'. By (1.3), $\{X_i = h(Y_i)\}$ have distribution $F(x, \theta) = R(x)^\theta$,
where $R(x) = F_0(h^{-1}(x))$. So, by suitable choice of $h(\cdot)$, $R(\cdot)$ may be chosen to be any continuous
distribution on $\mathbb{R}$. The cumulant generating function for $X_1$ is

$$K_R(t) = \log \int \exp(tx)\, dR(x)^\theta.$$

However, it is sometimes easier to calculate the necessary cumulants directly.

The maximum likelihood estimate is given by $\hat{\theta}_n^* = -\bar{X}_n^{-1}$, where $h(x) = \log F_0(x)$. This yields

Example 4.1 Suppose $R(x) = \exp(x)$ on $(-\infty, 0]$. Then $K_R(t) = -\log(1 + t/\theta)$, $\kappa_r(X_1) =
(-\theta)^{-r}(r - 1)!$, $g(\theta) = -\theta^{-1}$, $\sigma(\theta)^2 = \theta^{-2}$ and $\ell_r = (-1)^r (r - 1)!$ for $r > 2$. So, $\xi_n$, $\eta_n$, $g_r$, $f_r$ are
independent of $\theta$. Also $Y_n(\theta) = (\theta \bar{X}_n + 1) n^{1/2}$. So, by equation (2.7) a confidence interval with
level $1 - \alpha + O(n^{-(j+1)/2})$ is

$$N_{nj}(x_2)/|\bar{X}_n| > \theta > N_{nj}(x_1)/|\bar{X}_n|,$$

where $N_{nj}(x) = 1 - n^{-1/2} \eta_{nj}(x) = 1 - \sum_{i=1}^{j+1} n^{-i/2} g_{i-1}(x)$, where $\{g_i\}$ are given by (2.3), (2.4).
In this particular example, $n\theta|\bar{X}_n|$ has a gamma distribution with shape parameter $n$ and unit scale,
and hence $2n\theta|\bar{X}_n| \sim \chi^2_{2n}$, which one may use to obtain a confidence interval directly. The expansion
$L_n(x) = n + \sqrt{2} \sum_{i=1}^{\infty} n^{1-i/2} g_{i-1}(x)$ for $\chi^2_n$ is given by equation (3a) of Fisher and Cornish (1960).
So, $\chi^2_{2n} \leq L_{2n}(-x)$ with probability $1 - \Phi(x) + O(n^{-7/2})$.
It follows that for this example in terms of (3a), $N_{n6}(x) = L_{2n}(-x)/(2n)$. □
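The interval of Example 4.1 is easy to compare with the exact chi-square interval. The sketch below takes $j = 3$, uses $\ell_3 = -2$, $\ell_4 = 6$, $\ell_5 = -24$ from $\ell_r = (-1)^r (r-1)!$, and computes $N_{nj}(x) = 1 - n^{-1/2} \eta_{nj}(x)$ with $x_1 = -x_2 = \Phi^{-1}(1 - \alpha/2)$. The function names and the sample values ($n = 10$, $\bar{X}_n = -1$) are assumptions chosen only for illustration.

```python
import math
from statistics import NormalDist


def eta(y, n, l3, l4, l5, j=3):
    """eta_{nj}(y) = y + sum_{r=1}^{j} n^{-r/2} g_r(y), implemented for j <= 3."""
    g = [
        l3 * (y ** 2 - 1) / 6,
        l4 * (y ** 3 - 3 * y) / 24 - l3 ** 2 * (2 * y ** 3 - 5 * y) / 36,
        l5 * (y ** 4 - 6 * y ** 2 + 3) / 120
        - l3 * l4 * (y ** 4 - 5 * y ** 2 + 2) / 24
        + l3 ** 3 * (12 * y ** 4 - 53 * y ** 2 + 17) / 324,
    ]
    return y + sum(g[r - 1] * n ** (-r / 2) for r in range(1, j + 1))


def lehmann_interval(xbar, n, alpha=0.05, j=3):
    """Interval N_nj(x_1)/|xbar| < theta < N_nj(x_2)/|xbar| for X_i = log F_0(Y_i)."""
    if xbar >= 0:
        raise ValueError("the mean of log F_0(Y_i) must be negative")
    x1 = NormalDist().inv_cdf(1 - alpha / 2)  # x_1 = -x_2

    def big_n(x):
        # standardized cumulants l_r = (-1)^r (r - 1)! for r = 3, 4, 5
        return 1 - eta(x, n, -2.0, 6.0, -24.0, j) / math.sqrt(n)

    return big_n(x1) / abs(xbar), big_n(-x1) / abs(xbar)


lower, upper = lehmann_interval(xbar=-1.0, n=10)
print(lower, upper)
```

With $n = 10$ the exact 95% limits are $\chi^2_{20, 0.025}/(2n|\bar{X}_n|)$ and $\chi^2_{20, 0.975}/(2n|\bar{X}_n|)$, that is, about $9.591/20$ and $34.170/20$ when $|\bar{X}_n| = 1$, which gives a direct check on the truncated expansion.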
Example 4.2 Suppose $R(x) = x^\nu$ on $[0, 1]$, where $\nu > 0$ is given. This corresponds to $h(x) =
F_0(x)^{1/\nu}$.

So, this choice of $h(\cdot)$ will be both less efficient and less robust than the choice of Example 4.1. However,
it serves to illustrate the method when $\eta_n(\cdot)$ depends on $\theta$. In this case the cumulants are best calculated
from $E X_1^r = (1 + r\psi)^{-1}$, where $\psi = (\nu\theta)^{-1}$. So, $g(\theta) = (1 + \psi)^{-1}$ lies in $[0, 1]$. Set $t = g(\theta)$. Then

$$\sigma(\theta)^2 = (1 + 2\psi)^{-1} - t^2 = t(1 - t)^2 (2 - t)^{-1},$$

$$\kappa_3(\theta) = (1 + 3\psi)^{-1} - 3(1 + 2\psi)^{-1} t + 2t^3 = t(3 - 2t)^{-1} - 3t^2 (2 - t)^{-1} + 2t^3,$$

and so forth. By (2.10)-(2.12), $Q_1$ is given in terms of

$$M_1(t) = (1 - t)(2t^{-1} - 1)^{-1/2} \tag{4.1}$$

and $Q_2$ is given in terms of

$$M_2(t) = 2(1 - t)(1 - 2t)/(3 - 2t) \tag{4.2}$$

and

$$D_t M_1(t)^2/2 = (2 - t)^{-2} - t = (2 - t)^{-2} (1 - t)(1 - 3t + t^2). \tag{4.3}$$
Finally, by (2.8), a confidence interval of level $1 - \alpha + O(n^{-1})$ is given by

$$(S_{nx_1 1}(\bar{X}_n)^{-1} - 1)^{-1}/\nu < \theta < (S_{nx_2 1}(\bar{X}_n)^{-1} - 1)^{-1}/\nu,$$

where $S_{nx1}(t)$ is given by (2.9), (2.10), (2.11) and (4.1)-(4.3). By (3.1) the asymptotic efficiency
of this choice is $\{V(\theta, h)$ for the maximum likelihood estimate$\}/V(\theta, h) = \theta^2 \dot{g}(\theta)^2/\sigma(\theta)^2 =
1 - (\nu\theta + 1)^{-2}$, where $\dot{g}(\theta) = dg(\theta)/d\theta$. □
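The closed forms of Example 4.2 can be checked numerically, and the first-order interval is short enough to write out. The sketch below assumes $j = 1$, so that $S_{nx1}(t) = t + n^{-1/2} Q_1(t)$ with $Q_1(t) = -M_1(t) x$ (the sign that makes the lower limit correspond to $x_1 > 0$ in (1.2)), and inverts $g$ through $\theta = (s^{-1} - 1)^{-1}/\nu$. All function names and the numerical inputs are illustrative assumptions.

```python
import math
from statistics import NormalDist


def m1(t):
    """M_1(t) = sigma as a function of t = g(theta), equation (4.1)."""
    return (1 - t) / math.sqrt(2 / t - 1)


def m2(t):
    """M_2(t) = kappa_3 / sigma^2 as a function of t, equation (4.2)."""
    return 2 * (1 - t) * (1 - 2 * t) / (3 - 2 * t)


def half_d_m1_squared(t):
    """D_t M_1(t)^2 / 2, equation (4.3)."""
    return (2 - t) ** -2 - t


def kappa3(t):
    """Third cumulant of X_1 written in terms of t."""
    return t / (3 - 2 * t) - 3 * t ** 2 / (2 - t) + 2 * t ** 3


def first_order_interval(xbar, n, nu, alpha=0.05):
    """Interval from S_{nx1}(t) = t - n^{-1/2} M_1(t) x, mapped back through g^{-1}."""
    x1 = NormalDist().inv_cdf(1 - alpha / 2)  # x_1 = -x_2
    half_width = m1(xbar) * x1 / math.sqrt(n)
    s_low, s_high = xbar - half_width, xbar + half_width
    if not (0 < s_low and s_high < 1):
        raise ValueError("S_{nx1} left (0, 1); n is too small for this sketch")
    to_theta = lambda s: 1 / ((1 / s - 1) * nu)
    return to_theta(s_low), to_theta(s_high)


def efficiency(theta, nu):
    """Asymptotic efficiency relative to maximum likelihood, 1 - (nu*theta + 1)^{-2}."""
    return 1 - (nu * theta + 1) ** -2


print(first_order_interval(xbar=0.75, n=100, nu=1.5))
print(efficiency(theta=2.0, nu=1.5))
```

Here $\bar{X}_n = 0.75$ with $\nu = 1.5$ corresponds to the point estimate $\theta = 2$, so the printed interval should bracket 2, and the efficiency at that value is $1 - 1/16$.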